Abstract. Bipartite networks are a useful tool for representing and investigating interaction
networks. We consider methods for identifying communities in bipartite networks. Intuitive
notions of network community groups are made explicit using Newman's modularity measure. A
specialized version of the modularity, adapted to be appropriate for bipartite networks, is
presented; a corresponding algorithm is described for identifying community groups through
maximizing this measure. The algorithm is applied to networks derived from the EU Framework
Programs on Research and Technological Development. Community groups identified are compared
using information-theoretic methods.

INTRODUCTION



Networks have attracted a burst of attention in the last decade (useful reviews include
Refs. [1-4]), with applications to natural, social, and technological networks. Within
biology, networks are prevalent, including: neural networks, where synapses link neurons;
metabolic networks, describing metabolic processes in the cell, linking chemical reactions
and the regulatory processes that control them; protein interaction networks, representing
physical interactions between an organism's proteins; transcription networks,
describing regulatory interactions between different genes; food webs, using links to
characterize who eats whom; and networks of sexual relations and infections, including
AIDS models. Taking a broader view, networks seem to be everywhere! There are: electrical
power grids, whose stability relates to the network structure; airline networks, with
service efficiency tied to properties of the network; the World Wide Web, with search
engines using the network links to locate pages; networks in linguistics, with words linked
by co-occurrence; social networks of all sorts [5-7]; collaboration networks, describing
joint works amongst actors, authors, research labs, ...; and many more.

Of great current interest is the identification of community groups, or modules, within 
networks. Stated informally, a community group is a portion of the network whose 
members are more tightly linked to one another than to other members of the network. A 
variety of approaches [8-16] have been taken to explore this concept; see Refs. [17, 18] 
for useful reviews. Detecting community groups allows quantitative investigation of 
relevant subnetworks. Properties of the subnetworks may differ from the aggregate 
properties of the network as a whole, e.g., modules in the World Wide Web are sets 
of topically related web pages. 

Methods for identifying community groups can be specialized to distinct classes 
of networks, such as bipartite networks [19, 20]. The nodes in a bipartite network
can be partitioned into two disjoint sets such that no two nodes within the same set
are adjacent. Bipartite networks thus feature two distinct types of nodes, providing a 
natural representation for many affiliation or interaction networks, with one type of 
node representing actors and the other representing relations. Examples of actor-relation 
pairs include people attending events [21-23], court justices making decisions [23], 
scientists jointly publishing articles [24, 25], organizations collaborating in projects 
[26, 27], and legislators serving on committees [28]. Arguably, bipartite networks are 
the empirically standard case for social networks and other interaction networks, with 
unipartite networks appearing — often implicitly — as projections. 



COMMUNITY STRUCTURE IN NETWORKS



A Few Words on Graphs



We formally describe networks using the language of graph theory. Let $V$ be a set of
vertices and $E$ be a set of vertex pairs or edges from $V \times V$. The pair $G = (V, E)$ is
called a graph. In a simple graph, all pairs $\{u, v\} \in E$ are distinct and
$\{u, u\} \notin E$, i.e., there are no double lines or loops. Given a partition

$$ V = V_1 + V_2 $$

where no edges exist between pairs of points within $V_i$, $i = 1$ or $2$, then $G$ is said to be
bipartite.

We shall consider simple graphs on a (large) finite set $V$. Writing $d_i$ for the degree of
vertex $i$, the sum of all degrees equals $2|E|$; the quantity $2|E|$ is often called the volume
of the graph $G$.

Graph structure is encoded in the adjacency matrix $A$, with $A_{ij} = 1$ if
$\{i, j\} \in E$ and $A_{ij} = 0$ otherwise.
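
As a small illustration of these notions, the following sketch builds the adjacency matrix of a made-up bipartite graph, reads off the vertex degrees as row sums, and checks that the degrees add up to the volume $2|E|$. The vertex sets `V1` and `V2`, the edge list, and the helper `has_internal_edge` are hypothetical and chosen only for this example; they are not data from the Framework Programs.

```python
from itertools import combinations

# Hypothetical bipartite graph: vertices 0-2 form V1, vertices 3-4 form V2.
V1 = [0, 1, 2]
V2 = [3, 4]
edges = [(0, 3), (1, 3), (1, 4), (2, 4)]

n = len(V1) + len(V2)

# Symmetric adjacency matrix of a simple graph: A[i][j] = 1 iff {i, j} is an edge.
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = 1
    A[j][i] = 1

# The degree of each vertex is the corresponding row sum.
degrees = [sum(row) for row in A]

# The volume 2|E| equals the sum of all degrees.
volume = sum(degrees)


def has_internal_edge(part):
    """Return True if any two vertices of `part` are adjacent."""
    return any(A[i][j] for i, j in combinations(part, 2))


# The split (V1, V2) is a valid bipartition if neither side has internal edges.
is_bipartite_split = not has_internal_edge(V1) and not has_internal_edge(V2)
```

With the vertices ordered by type, all nonzero entries of `A` fall in the two off-diagonal blocks, which is the structure a bipartite null model has to respect.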



Modularity



Detecting community groups allows for the identification and quantitative investigation
of relevant subnetworks. Local properties of the community groups may differ
from the global properties of the complete network. For example, topically related web
pages in the World Wide Web are typically interlinked, so that the contents of pages in
distinct community groups should reveal distinct themes. Thus, identification of community
groups within a network is a first step towards understanding the heterogeneous
substructures of the network.

To identify communities, we take as our starting point the modularity, introduced 
by Newman and Girvan [14]. Modularity makes intuitive notions of community groups 
precise by comparing network edges to those of a null model. As noted by Newman 
[29]: 

A good division of a network into communities is not merely one in which 
there are few edges between communities; it is one in which there are fewer 
than expected edges between communities. 

Definition 1 The modularity $Q$ is, up to a normalization constant, the number of edges
within communities $c$ minus those for a null model:

$$ Q = \frac{1}{2|E|} \sum_{c} \sum_{i,j \in c} \left( A_{ij} - P_{ij} \right) . \tag{7} $$



Along with eq. (7), it is necessary to provide a null model, defining $P_{ij}$. The standard
choice for the null model constrains the degree distribution for the vertices to match the
degree distribution in the actual network. Random graph models of this sort are obtained
[30] by putting an edge between vertices $i$ and $j$ at random, with the constraint that on
average the degree of any vertex $i$ is $d_i$. This constrains the expected adjacency matrix
such that

$$ d_i = E\Big( \sum_{j} A_{ij} \Big) . \tag{8} $$

Denote $E(A_{ij})$ by $P_{ij}$ and assume further that $P_{ij}$ factorizes into

$$ P_{ij} = p_i p_j , \tag{9} $$

leading to

$$ P_{ij} = \frac{d_i d_j}{2|E|} . \tag{10} $$



A consequence of the null model choice is that $Q = 0$ when all vertices are in the same
community.
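
The definition and the null model can be checked numerically on a toy example. The sketch below evaluates the modularity with the degree-preserving null model $d_i d_j / 2|E|$ for a given division into communities; the function name `modularity` and the test graph (two triangles joined by a single edge) are illustrative assumptions, not part of the analysis in this paper.

```python
def modularity(A, communities):
    """Q = (1 / 2|E|) * sum over communities c and i, j in c of (A_ij - d_i d_j / 2|E|)."""
    degrees = [sum(row) for row in A]
    volume = sum(degrees)  # 2|E|
    q = 0.0
    for c in communities:
        for i in c:
            for j in c:
                q += A[i][j] - degrees[i] * degrees[j] / volume
    return q / volume


# Hypothetical test graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

# All vertices in one community: Q vanishes (up to rounding).
q_all = modularity(A, [list(range(n))])

# One community per triangle: Q = 5/14 for this graph.
q_split = modularity(A, [[0, 1, 2], [3, 4, 5]])
```

The double loop costs a number of operations quadratic in the community sizes, which is acceptable for a check like this one but not for networks of the size considered later.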

The goal now is to find a division of the vertices into communities such that the mod- 
ularity Q is maximal. An exhaustive search for a decomposition is out of the question: 
even for moderately large graphs there are far too many ways to decompose them into 
communities. Fast approximate algorithms do exist (see, for example, Refs. [31, 32]). 



Finding Communities with BRIM 



Specific classes of networks have additional constraints that can be reflected in the
null model. For bipartite graphs, the null model should be modified to reproduce the
characteristic form of bipartite adjacency matrices

$$ A = \begin{pmatrix} 0 & \tilde{A} \\ \tilde{A}^{T} & 0 \end{pmatrix} , $$

where the vertices are ordered by type and the block $\tilde{A}$ holds the edges between
vertices of the two types.



TABLE 1. R&D projects and participants in the EU Framework Programs. The projects and
organizations are used to define bipartite networks for each of the Framework Programs.

Framework Program    Period       Projects    Organizations
FP1                  1984-1987       3,283            1,981
FP2                  1987-1991       3,885            4,572
FP3                  1990-1994       5,529            7,324
FP4                  1994-1998      15,061           19,755
FP5                  1998-2002      15,559           22,303

Recently, specialized modularity measures and search algorithms have been proposed
for finding communities in bipartite networks [19, 20]. 

We make use of the algorithm called BRIM: bipartite, recursively induced modules
[19]. Starting from a (more or less) ad hoc partition of the vertices of type 1, it is
straightforward to optimize a corresponding decomposition of the vertices of type 2. From
there, we optimize the decomposition of the vertices of type 1, and iterate. In this fashion,
modularity increases until a (local) maximum is reached. However, the question remains: is
the maximum a "good" one? To address this, a random search is called for, varying the
composition and number of communities, with the goal of reaching a better maximum
after a new round of hill climbing using the BRIM algorithm.
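
The alternating optimization can be sketched in a few lines. The code below is a minimal illustration, not the implementation used for the results reported here. It assumes a bipartite null model of product form, $k_i d_j / m$, where $k_i$ and $d_j$ are the degrees of the type-1 and type-2 vertices and $m$ is the number of edges; this form is an assumption of the sketch, since the bipartite null model is not written out above. The names `brim`, `bipartite_modularity`, and `A_tilde` (the block of the adjacency matrix linking the two vertex types) are hypothetical, and the random search over the number of communities is omitted.

```python
def bipartite_modularity(B, g, h, m):
    """Sum B[i][j] over pairs assigned to the same community, divided by m."""
    return sum(
        B[i][j]
        for i in range(len(g))
        for j in range(len(h))
        if g[i] == h[j]
    ) / m


def brim(A_tilde, g_init, n_comm, max_iter=100):
    """Alternate between the two vertex types until modularity stops increasing.

    A_tilde : p x q 0/1 matrix linking type-1 vertices (rows) to type-2 (columns)
    g_init  : initial community label (0..n_comm-1) for each type-1 vertex
    """
    p, q = len(A_tilde), len(A_tilde[0])
    k = [sum(row) for row in A_tilde]                             # type-1 degrees
    d = [sum(A_tilde[i][j] for i in range(p)) for j in range(q)]  # type-2 degrees
    m = sum(k)                                                    # number of edges
    # Assumed bipartite null model: expected weight k_i * d_j / m.
    B = [[A_tilde[i][j] - k[i] * d[j] / m for j in range(q)] for i in range(p)]

    g, h = list(g_init), [0] * q
    q_old = float("-inf")
    for _ in range(max_iter):
        # Assign each type-2 vertex to the community where it contributes most.
        for j in range(q):
            scores = [0.0] * n_comm
            for i in range(p):
                scores[g[i]] += B[i][j]
            h[j] = max(range(n_comm), key=scores.__getitem__)
        # Assign each type-1 vertex likewise, given the type-2 labels.
        for i in range(p):
            scores = [0.0] * n_comm
            for j in range(q):
                scores[h[j]] += B[i][j]
            g[i] = max(range(n_comm), key=scores.__getitem__)
        q_new = bipartite_modularity(B, g, h, m)
        if q_new <= q_old + 1e-12:  # no further improvement: local maximum
            break
        q_old = q_new
    return g, h, q_new


# Hypothetical example: two disconnected 2 x 2 blocks, poor initial labels.
A_tilde = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
g, h, q_final = brim(A_tilde, g_init=[0, 1, 1, 1], n_comm=2)
```

Each half-step chooses the best label for every vertex of one type independently, so the modularity cannot decrease. That property is what makes the procedure a hill climb and also why it can stall at a local maximum.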



AN INITIAL LOOK AT EU RESEARCH NETWORKS 

In the ongoing research project NEMO [33], networks of research and development 
collaborations under EU Framework Programs FP1, FP2, ..., FP5 are studied. The
collaborations in the Framework Programs give rise to bipartite graphs, with edges 
existing between projects and the organizations which take part in them. With this 
construction, participating organizations are linked only through joint projects. 

In the various Framework Programs, the number of organizations ranges from 2,000 
to 20,000, the number of projects ranges from 3,000 to 15,000, and the number of links 
between them (project participations) ranges from 10,000 to 250,000 (see table 1 for 
precise values). A popular approach in social network analysis — where networks are 
often small, consisting of a few dozen nodes — is to visualize the networks and identify 
community groups by eye. However, the Framework Program networks are much larger: 
can we "see" the community groups in these networks? 

Structural differences or similarities of such networks are not obvious at a glance. For
a graphical representation of the organizations and/or projects by dots on an A4 sheet of
paper, we would have to put these dots at a distance of about 1 mm from each other, and
we then still would not have drawn the links (collaborations) which connect them.
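
The spacing estimate can be reproduced with a short calculation. The sketch below assumes that the dots are laid out on a square grid filling the whole sheet and uses the FP5 counts from Table 1; the function name `dot_spacing_mm` is hypothetical.

```python
import math

A4_WIDTH_MM, A4_HEIGHT_MM = 210.0, 297.0  # ISO 216 A4 sheet


def dot_spacing_mm(n_dots):
    """Grid spacing if n_dots are spread evenly over an A4 sheet."""
    return math.sqrt(A4_WIDTH_MM * A4_HEIGHT_MM / n_dots)


# Projects plus organizations for FP5, taken from Table 1.
fp5_spacing = dot_spacing_mm(15559 + 22303)
```

For FP5 this gives a spacing somewhat above 1 mm, in line with the estimate in the text.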

Previous studies used a list of coarse graining recipes to compact the networks into 
a form which would lend itself to a graphical representation [34]. As an alternative 
we have attempted to detect communities just using BRIM, i.e., purely on the basis of 
relational network structure, and blind with respect to any additional information about 
the nature of agents. 

In Fig. 1, we show a community structure for FP3 found using the BRIM algorithm, 
with a modularity of Q = 0.602 for 14 community groups. The communities are shown 
as vertices in a network, with the vertex positions determined using spectral methods 
[35]. The area of each vertex is proportional to the number of edges from the original 
network within the corresponding community. The width of each edge in the community 
network is proportional to the number of edges in the original network connecting 
community members from the two linked groups. The vertices and edges are shaded 
to provide additional information about their topical structure, as described in the next 
section. Each community is labeled with the most frequently occurring subject index. 



Topical Profiles of Communities 

Projects are assigned one or more standardized subject indices. There are 49 subject 
indices in total, ranging from Aerospace to Waste Management. 
We denote by

$$ f(t) > 0 \tag{12} $$

the frequency of occurrence of the subject index $t$ in the network, with

$$ \sum_{t} f(t) = 1 . \tag{13} $$

Similarly we consider the projects within one community $c$ and the frequency

$$ f_c(t) > 0 \tag{14} $$

of any subject index $t$ appearing in the projects only of that community. We call $f_c$ the
topical profile of community $c$ to be compared with that of the network as a whole.

Topical differentiation of communities can be measured by comparing their profiles,
among each other or with respect to the overall network. This can be done in a variety
of ways [36], such as by the Kullback "distance"

$$ D_c = \sum_{t} f_c(t) \ln \frac{f_c(t)}{f(t)} . \tag{15} $$

A true metric is given by

$$ d_c = \sum_{t} \left| f_c(t) - f(t) \right| , \tag{16} $$

ranging from zero to two.
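
Both measures are easy to evaluate once the profiles are available. The sketch below computes them for made-up profiles over three subject indices; the function names and the numbers are illustrative assumptions and do not correspond to any community discussed in this paper.

```python
import math


def kullback_distance(f_c, f):
    """D_c = sum over t of f_c(t) * ln(f_c(t) / f(t)); assumes identical keys."""
    return sum(f_c[t] * math.log(f_c[t] / f[t]) for t in f if f_c[t] > 0)


def topical_differentiation(f_c, f):
    """d_c = sum over t of |f_c(t) - f(t)|, a value between 0 and 2."""
    return sum(abs(f_c[t] - f[t]) for t in f)


# Hypothetical profiles; each sums to one.
f = {"Agriculture": 0.25, "Food": 0.25, "Energy": 0.5}      # whole network
f_c = {"Agriculture": 0.5, "Food": 0.25, "Energy": 0.25}    # one community

D_c = kullback_distance(f_c, f)        # equals 0.25 * ln 2 for these profiles
d_c = topical_differentiation(f_c, f)  # equals 0.5 for these profiles
```

The value $d_c = 2$ is reached only when the two profiles have no subject index in common, and $d_c = 0$ only when they coincide.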



FIGURE 1. Community groups in the network of projects and organizations for FP3. 



Topical differentiation is illustrated in Fig. 2. In the figure, example profiles are shown,
taken from the network in Fig. 1. The community-specific profile corresponds to the
community labeled "11. Food" in Fig. 1. Based on the most frequently occurring subject
indices (Agriculture, Food, and Resources of the Seas, Fisheries), the community
consists of projects and organizations focussed on R&D related to food products. The
topical differentiation is $d_c = 0.90$ for the community shown.




FIGURE 2. Topical differentiation in a network community. The histogram shows the difference
between the topical profile $f_c(t)$ for a specific community (dark bars) and the overall
profile $f(t)$ for the network as a whole (light bars). The community-specific profile shown is
for the community labeled "11. Food" in Fig. 1. The community has $d_c = 0.90$.

Projects and Subject Indices 

For further analysis, we have also looked for communities in networks of projects 
and subject indices. Here, the projects and subject indices constitute the vertices of a 
bipartite network, with edges existing between projects and the subject indices assigned 
to them. This construction disregards the organizations, providing an alternate approach 
to investigating the topical structure of the Framework Programs. 

In Fig. 3, we show a set of nine community groups identified for the network of 
projects and subject indices in FP5. The modularity is Q = 0.50290 with these commu- 
nity groups. 

The later Framework Programs, such as FP5, show a fair degree of overlap between 
the communities, due to the subject indices being freely assigned in project applications. 
This is in marked contrast to the networks for the first three Framework Programs, where 
topics were attributed rigidly within thematic subprograms. The communities for FP1-3 
thus have more clearly segregated community structures. FP1 is particularly extreme, 
having no overlaps between the communities. The differences between Framework 
Programs point to the need for some care in interpreting community structures: the 
communities in FP1-3 reflect policy structures while those found for later Framework 
Programs are more representative of interaction patterns. 



FIGURE 3. Community groups in the network of projects and subject indices for FP5.



Comparing Decompositions with the Mutual Information 

For different decompositions of a (finite) set $\Omega$, a measure of similarity is provided by
the normalized mutual information [17]. For a subset $X \subseteq \Omega$, set

$$ p(X) = \frac{|X|}{|\Omega|} . \tag{17} $$

For divisions of $\Omega$ into

1. $J$ disjoint subsets $X_1, \ldots, X_J$ and

2. $K$ disjoint subsets $Y_1, \ldots, Y_K$,

compute the mutual information

$$ I(X, Y) = \sum_{j,k} p(X_j \cap Y_k) \log \frac{p(X_j \cap Y_k)}{p(X_j)\, p(Y_k)} \tag{18} $$

and normalize

$$ I_{\mathrm{norm}}(X, Y) = \frac{2 I(X, Y)}{H(X) + H(Y)} , \tag{19} $$

where the entropy $H$ is

$$ H(X) = - \sum_{j} p(X_j) \log p(X_j) . \tag{20} $$

With the definitions in eqs. (18) through (20), $I_{\mathrm{norm}}$ ranges from zero, for
uncorrelated decompositions of the set, to one, for perfectly correlated decompositions.
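
A minimal sketch of this computation follows. It represents each decomposition as a list giving the subset label of every element of the set, which is equivalent to listing the disjoint subsets themselves. The function name and the convention of returning 1.0 when both decompositions consist of a single subset (where the ratio is 0/0) are assumptions made for the example.

```python
import math
from collections import Counter


def normalized_mutual_information(x_labels, y_labels):
    """2 I(X, Y) / (H(X) + H(Y)) for two labelings of the same elements."""
    n = len(x_labels)
    x_counts = Counter(x_labels)                 # |X_j|
    y_counts = Counter(y_labels)                 # |Y_k|
    xy_counts = Counter(zip(x_labels, y_labels))  # |X_j intersect Y_k|

    # Mutual information; empty intersections contribute nothing.
    mi = sum(
        (c / n) * math.log(c * n / (x_counts[a] * y_counts[b]))
        for (a, b), c in xy_counts.items()
    )
    h_x = -sum((c / n) * math.log(c / n) for c in x_counts.values())
    h_y = -sum((c / n) * math.log(c / n) for c in y_counts.values())

    if h_x + h_y == 0.0:
        return 1.0  # assumed convention: both decompositions are trivial
    return 2.0 * mi / (h_x + h_y)


# Same decomposition with the labels swapped: perfectly correlated.
nmi_same = normalized_mutual_information([0, 0, 1, 1], [1, 1, 0, 0])

# Each subset of one decomposition is split evenly by the other: uncorrelated.
nmi_indep = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

Because only the co-occurrence counts enter, the result does not depend on how the subsets are named or ordered, so decompositions with different numbers of communities can be compared directly.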

Using the BRIM algorithm, we have partitioned network vertices into community 
groups by maximizing the modularity Q. In principle, many dissimilar partitions of the 
vertices could produce similar modularity values. With the normalized mutual informa- 
tion (or a similar measure), we can assess the amount of shared structure between two 
different partitions of the vertex set. 

For example, in Fig. 3 we have shown a decomposition of the network of projects 
and subject indices for FP5. In Fig. 4, we show a second decomposition of the same 
network. The modularity is nearly identical — 0.50296 instead of the Q = 0.50290 seen 
previously — and the decompositions have some visible similarities. However, there are 
also definite structural differences, most prominently that the second decomposition 
has only eight communities while the first has nine. Are the structural differences 
significant? The normalized mutual information is found to be $I_{\mathrm{norm}} = 0.98$, indicating a
strong correlation between the two decompositions and demonstrating that they have 
relatively minor structural differences. 



Outlook

We have successfully identified community groups in networks defined from the
Framework Program projects and the subject indices assigned to them. The full networks
defined from the projects and the organizations taking part in them are considerably
larger and correspondingly more challenging to investigate. For the full
organizations-projects network, the BRIM hill climbing algorithm is being supplemented with an
aggressive, probabilistic search through community configurations upon which BRIM
acts. This extended search has only just begun, but preliminary results are encouraging.

Once established, communities will then be investigated with regard to their internal
structure, with the goal of identifying correlated properties within communities and
contrasting properties across communities. We expect the analysis of internal structure
to reveal patterns, themes, and motivations of collaborative research and development in
the European Union.